"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
--Tom M. Mitchell, Carnegie Mellon University
So if your program is to predict, say, the weather on a long weekend (Task T), you can run it through a machine learning algorithm with past data (Experience E) and, if it has successfully "learned", it will be better at producing weather forecasts (Performance measure P).
A few other popular definitions are
The ability of a machine to improve its own performance through the use of artificial intelligence techniques, in order to mimic the ways humans seem to learn (repetition and experience)
and
Machine learning is the use of generic algorithms with a problem-specific set of data, without having to write any custom code specific to the problem. Instead, data is fed to those generic algorithm(s), and they build their own logic based on the provided data
The following are the common ML methods
Classification
means to group the output into a class. The program is "trained" using data from past events (called training data), which then enables it to predict future events accurately when provided with new data. In this setting, the data is a set of training examples with the associated "correct answers", and the algorithm learns to predict the correct answer from this training set. In other words, we have sets of correct/validated questions and answers, and using them the system tries to find the answer to a newly asked question.
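The idea of learning from "correct answers" can be sketched with a deliberately tiny classifier. The following is a minimal 1-nearest-neighbour example, not a production method; the animal measurements and labels are invented purely for illustration.

```python
# A minimal supervised classifier: each training example is a
# (features, label) pair -- the "question" and its "correct answer".
# Prediction returns the label of the closest known example.

def predict(training_data, query):
    """Return the label of the training example nearest to `query`."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(training_data, key=lambda pair: distance(pair[0], query))
    return nearest[1]

# Hypothetical training set: (height_cm, weight_kg) -> species label
training = [
    ((20, 4), "cat"),
    ((25, 6), "cat"),
    ((60, 25), "dog"),
    ((70, 30), "dog"),
]

print(predict(training, (22, 5)))   # a small animal -> "cat"
print(predict(training, (65, 28)))  # a large animal -> "dog"
```

New, unseen measurements are classified by analogy to the labelled past data, which is exactly the training/prediction loop described above.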
Many analysis techniques are employed; a few of them are as follows
Regression analysis is a form of predictive modelling technique which makes predictions by investigating the relationship between dependent and independent variable(s). It is mainly used in forecasting, time series modelling and finding the causal-effect relationship between different variables. For example, the relationship between rash driving and the number of road accidents by a driver is best studied through regression.
It is also an important tool for modelling and analyzing data: we fit a curve/line to the data points in such a manner that the sum of the squared distances of the data points from the curve or line is minimized. We will cover it in detail in the coming sections.
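The curve-fitting idea above can be shown for the simplest case, a straight line fit by ordinary least squares. This is a sketch of the closed-form solution for one predictor; the data points are invented and lie exactly on a known line so the result is easy to check.

```python
# Ordinary least squares for a single predictor: fit y = a*x + b so that
# the sum of squared vertical distances from the points to the line is
# minimised.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); intercept follows from the means
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented points lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # -> 2.0 1.0
```

With noisy real-world data the fitted line will not pass through every point, but the same formula still gives the line with the smallest total squared error.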
There are many kinds of regression techniques available for prediction; they are mostly defined by the following three metrics
The most common regression algorithms are as follows
Classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is already defined.
In an unsupervised learning algorithm, the algorithm itself finds trends in the given data, without looking for a specific "correct answer".
Examples of unsupervised learning algorithms include clustering (grouping similar data points) and anomaly detection (detecting unusual data points in a data set).
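Clustering can be illustrated with a bare-bones k-means sketch on one-dimensional data with k = 2. Everything here is invented for illustration (the points, the starting centres, the iteration count); note that the algorithm is given no labels at all, yet still separates the two groups.

```python
# A minimal k-means sketch (k = 2, 1-D data): alternate between assigning
# each point to its nearest centre and moving each centre to the mean of
# its assigned points.

def kmeans_1d(points, centres, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins the nearer of the two centres.
        clusters = [[], []]
        for p in points:
            idx = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
            clusters[idx].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centres = [sum(c) / len(c) for c in clusters]
    return centres, clusters

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
centres, clusters = kmeans_1d(points, centres=[0.0, 5.0])
print(centres)  # -> [1.5, 11.0]: the two natural groups in the data
```

Real implementations add safeguards this sketch omits, such as handling empty clusters and choosing starting centres automatically.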
Basket analysis: P(Y | X) is the probability that somebody who buys X also buys Y, where X and Y are products/services. Example: P(chips | beer) = 0.7
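Such a conditional probability can be estimated directly from transaction data: count the baskets containing X, and see what fraction of those also contain Y. The transactions below are made up for illustration.

```python
# Estimate P(chips | beer): among baskets containing beer, the fraction
# that also contain chips. The transaction list is invented.

transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"beer", "chips", "milk"},
]

with_beer = [t for t in transactions if "beer" in t]
p_chips_given_beer = sum("chips" in t for t in with_beer) / len(with_beer)
print(p_chips_given_beer)  # 3 of the 4 beer baskets contain chips -> 0.75
```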
The most common steps in creating a solution using Machine Learning are as follows
In this step, historic data is collected from various data sources, such as weather stations or news archives, depending on the requirement.
It is good practice to have a large variety, density and volume of relevant data, because the better the data, the better the machine's prospects of predicting accurately.
Machine learning is about learning some properties of a data set and applying them to new data. This is why a common practice when evaluating a machine learning algorithm is to split the data at hand into two sets: a training set, on which we learn the data's properties, and a testing set, on which we test those properties.
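The split described above can be sketched in a few lines. This hand-rolled version mirrors the idea behind library helpers such as scikit-learn's `train_test_split`; the dummy dataset, fraction and seed are arbitrary choices for the sketch.

```python
import random

# Shuffle the examples, then hold out a fraction for testing.
def train_test_split(data, test_fraction=0.25, seed=42):
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed makes the split repeatable
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

examples = [(i, i % 2) for i in range(20)]  # placeholder (features, label) pairs
train, test = train_test_split(examples)
print(len(train), len(test))  # -> 15 5
```

Shuffling before splitting matters: if the data is ordered (say, by date or by class), slicing it without shuffling would give training and testing sets with different distributions.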
Deep Learning is associated with Artificial Neural Networks (ANNs). An ANN borrows concepts from the human brain to facilitate the modelling of arbitrary functions. ANNs require a vast amount of data, and the approach is highly flexible when it comes to modelling multiple outputs simultaneously.
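The smallest building block of an ANN is a single artificial neuron. The sketch below trains one perceptron on the OR function; it is far from "deep" (real networks stack many such units in layers), and the learning rate and epoch count are arbitrary choices for illustration, but the error-driven weight-update loop is the core idea.

```python
# A single perceptron trained on the OR function: compute a weighted sum,
# threshold it, and nudge the weights whenever the output is wrong.

def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output
            # Move weights in the direction that reduces the error.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

or_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(or_gate)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
print([predict(x1, x2) for (x1, x2), _ in or_gate])  # -> [0, 1, 1, 1]
```

A single perceptron can only learn linearly separable functions (it famously cannot learn XOR); stacking layers of such units with non-linear activations is what gives deep networks their ability to model arbitrary functions.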
Correlation: Correlation is any of a broad class of statistical relationships involving dependence, though in common usage it most often refers to the extent to which two variables have a linear relationship with each other. Examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.
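The linear relationship mentioned above is usually measured with the Pearson correlation coefficient, r = cov(x, y) / (std(x) · std(y)). The sketch below computes it from that definition; the two height series are invented so that one is a near-linear function of the other, echoing the parents/offspring example.

```python
# Pearson correlation coefficient from its definition:
# r = cov(x, y) / (std_x * std_y), ranging from -1 to +1.

def pearson(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Invented, near-linear data: parents' vs children's heights in cm
heights_parents = [160, 165, 170, 175, 180]
heights_children = [162, 166, 171, 174, 181]
r = pearson(heights_parents, heights_children)
print(round(r, 3))  # close to +1: a strong positive linear relationship
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate little or no linear relationship (though a non-linear dependence may still exist).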